Learning from Finite Training Sets
Authors
Abstract
We analyse online (gradient descent) learning of a rule from a finite set of training examples at non-infinitesimal learning rates η, calculating exactly the time-dependent generalization error for a simple model scenario. In the thermodynamic limit, we close the dynamical equation for the generating function of an infinite hierarchy of order parameters using 'within-sample self-averaging'. The resulting dynamics is non-perturbative in η, with a slow mode appearing only above a finite threshold η_min. Optimal settings of η for a given final learning time are determined, and the results are compared with offline gradient descent.

Neural networks have been the subject of much recent research because of their ability to learn rules from examples. One of the most common learning algorithms is online gradient descent: the weights of a network ('student') are updated each time a training example from the training set is presented, such that the error on this example is reduced. In offline gradient descent, on the other hand, the total error on all examples in the training set is accumulated before a gradient descent weight update is made. For a given training set and starting weights, offline learning is entirely deterministic. Online learning, on the other hand, is a stochastic process due to the random choice of training example (from the given training set) for each update. It becomes equivalent to offline learning only in the limit where the learning rate η → 0 [1]. In both cases, the main quantity of interest is the evolution of the generalization error: after a given number of weight updates, how well does the student approximate the input-output mapping ('teacher' rule) underlying the training examples? We do not consider non-gradient descent learning algorithms in the following, and we also restrict ourselves to gradient descent on the most common measure of error on a training example, the squared output deviation (see eq. (1) below). For interesting recent results on more general, optimized online learning algorithms, see, e.g., [2, 3].
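To make the distinction between the two algorithms concrete, the following minimal sketch (not taken from the paper; a linear student-teacher toy model with assumed sizes N, p and learning rate η) contrasts the online and offline update rules on the squared output deviation:

```python
import numpy as np

rng = np.random.default_rng(0)
N, p, eta = 50, 100, 1.0          # input dimension, training set size, learning rate
w_star = rng.standard_normal(N)    # "teacher" weights generating the data
X = rng.standard_normal((p, N))    # fixed, finite training set
y = X @ w_star                     # noise-free teacher outputs

def online_step(w, i):
    # Online: gradient of the squared error on one randomly chosen example,
    # with the learning rate scaled by 1/N (a convention assumed here).
    return w + (eta / N) * (y[i] - w @ X[i]) * X[i]

def offline_step(w):
    # Offline: gradient of the mean squared error over the whole training set,
    # accumulated before a single weight update is made.
    return w + (eta / N) * X.T @ (y - X @ w) / p

w_on, w_off = np.zeros(N), np.zeros(N)
for _ in range(20000):
    w_on = online_step(w_on, rng.integers(p))   # stochastic: random example each time
    w_off = offline_step(w_off)                 # deterministic given X, y and w(0)

# Generalization error (per input dimension) for unit-variance Gaussian inputs.
for name, w in (("online ", w_on), ("offline", w_off)):
    print(name, 0.5 * np.sum((w - w_star) ** 2) / N)
```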
Similar references
On-line Learning from Finite Training Sets in Nonlinear Networks
Online learning is one of the most common forms of neural network training. We present an analysis of online learning from finite training sets for non-linear networks (namely, soft-committee machines), advancing the theory to more realistic learning scenarios. Dynamical equations are derived for an appropriate set of order parameters; these are exact in the limiting case of either linear netwo...
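As an aside, a soft committee machine of the kind referred to above is a two-layer network whose hidden-to-output weights are fixed and equal, so that only the input-to-hidden weights are learned. The sketch below is purely illustrative; the tanh activation and the sizes K, N are assumptions (erf-type activations are also common in this literature):

```python
import numpy as np

def soft_committee(W, x):
    """Soft committee machine: K hidden units with adaptive input-to-hidden
    weights W (shape K x N) and fixed, equal hidden-to-output weights."""
    return np.tanh(W @ x / np.sqrt(W.shape[1])).sum()

rng = np.random.default_rng(1)
K, N = 3, 20
W_student = rng.standard_normal((K, N))   # adjustable weights of the "student"
x = rng.standard_normal(N)                # one input pattern
print(soft_committee(W_student, x))
```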
Online Learning from Finite Training Sets: An Analytical Case Study
We analyse online learning from finite training sets at noninfinitesimal learning rates η. By an extension of statistical mechanics methods, we obtain exact results for the time-dependent generalization error of a linear network with a large number of weights N. We find, for example, that for small training sets of size p ~ N, larger learning rates can be used without compromising asymptotic g...
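The time dependence referred to here can be illustrated empirically (a toy simulation, not the paper's analytical result) by tracking the generalization error of a linear student trained online on a fixed set of p ~ N examples, for a smaller and a larger learning rate; all sizes and rates below are assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 100
p = N                                   # small training set, p ~ N
w_star = rng.standard_normal(N)
X = rng.standard_normal((p, N))
y = X @ w_star                          # noise-free linear teacher

def generalization_error(w):
    # Average squared output deviation over Gaussian inputs, per dimension.
    return 0.5 * np.sum((w - w_star) ** 2) / N

def run(eta, steps=40000, every=4000):
    w = np.zeros(N)
    trace = []
    for t in range(steps):
        i = rng.integers(p)                         # random training example
        w += (eta / N) * (y[i] - w @ X[i]) * X[i]   # online gradient step
        if t % every == 0:
            trace.append(generalization_error(w))
    return trace

for eta in (0.3, 1.2):                  # a small and a larger learning rate
    print("eta =", eta, "-> eg trace:", [round(e, 4) for e in run(eta)])
```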
Dynamical and stationary properties of on-line learning from finite training sets.
The dynamical and stationary properties of on-line learning from finite training sets are analyzed by using the cavity method. For large input dimensions, we derive equations for the macroscopic parameters, namely, the student-teacher correlation, the student-student autocorrelation and the learning force fluctuation. This enables us to provide analytical solutions to Adaline learning as a benc...
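For orientation, the first two of these macroscopic parameters can be measured directly in a simulation of online Adaline (LMS) learning. The normalizations and model sizes below are assumptions of this sketch, and the learning force fluctuation is not shown:

```python
import numpy as np

rng = np.random.default_rng(3)
N, p, eta = 100, 200, 0.5
w_star = rng.standard_normal(N)
X = rng.standard_normal((p, N))
y = X @ w_star                                  # linear (Adaline) teacher outputs

w = np.zeros(N)
for t in range(1, 20001):
    i = rng.integers(p)
    w += (eta / N) * (y[i] - w @ X[i]) * X[i]   # Adaline / LMS online update
    if t % 5000 == 0:
        R = w @ w_star / N                      # student-teacher correlation
        Q = w @ w / N                           # student-student autocorrelation
        print(f"t={t:6d}  R={R:.3f}  Q={Q:.3f}")
```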
Online Learning from Finite Training Sets and Robustness to Input Bias
We analyze online gradient descent learning from finite training sets at noninfinitesimal learning rates η. Exact results are obtained for the time-dependent generalization error of a simple model system: a linear network with a large number of weights N, trained on p = αN examples. This allows us to study in detail the effects of finite training set size α on, for example, the optima...
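The dependence on the training set size α = p/N can again be illustrated with a toy simulation (assumed sizes and learning rate, not the paper's exact results); for α < 1 the training data cannot pin down the teacher, so the generalization error remains finite:

```python
import numpy as np

rng = np.random.default_rng(4)
N, eta, steps = 100, 1.0, 30000

def final_eg(alpha):
    p = int(alpha * N)                  # training set size p = alpha * N
    w_star = rng.standard_normal(N)
    X = rng.standard_normal((p, N))
    y = X @ w_star
    w = np.zeros(N)
    for _ in range(steps):
        i = rng.integers(p)
        w += (eta / N) * (y[i] - w @ X[i]) * X[i]   # online gradient step
    return 0.5 * np.sum((w - w_star) ** 2) / N      # final generalization error

for alpha in (0.5, 1.0, 2.0, 4.0):
    print(f"alpha={alpha}:  eg ~ {final_eg(alpha):.4f}")
```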
Backstitch: Counteracting Finite-Sample Bias via Negative Steps
In this paper we describe a modification to Stochastic Gradient Descent (SGD) that improves generalization to unseen data. It consists of doing two steps for each minibatch: a backward step with a small negative learning rate, followed by a forward step with a larger learning rate. The idea was initially inspired by ideas from adversarial training, but we show that it can be viewed as a crude w...
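A hedged sketch of this two-step update on a simple least-squares problem is given below; the backstitch scale alpha and the exact learning-rate split are assumptions of the illustration, not necessarily the authors' settings:

```python
import numpy as np

rng = np.random.default_rng(5)
N, p = 20, 200
w_star = rng.standard_normal(N)
X = rng.standard_normal((p, N))
y = X @ w_star + 0.1 * rng.standard_normal(p)   # noisy targets

def minibatch_grad(w, idx):
    # Gradient of the mean squared error on one minibatch.
    return X[idx].T @ (X[idx] @ w - y[idx]) / len(idx)

def backstitch_step(w, idx, eta=0.05, alpha=0.3):
    w = w + alpha * eta * minibatch_grad(w, idx)          # backward (negative) step
    w = w - (1.0 + alpha) * eta * minibatch_grad(w, idx)  # larger forward step
    return w

w = np.zeros(N)
for _ in range(2000):
    idx = rng.choice(p, size=10, replace=False)           # one minibatch per update
    w = backstitch_step(w, idx)
print("final training MSE:", np.mean((X @ w - y) ** 2))
```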
Economical Training Sets for Linear ID3 Learning
Our work is in machine learning, a subfield of artificial intelligence. We describe a variant of the ID3 algorithm [5] which is attuned to the situation that every feature’s value-set is linearly ordered and finite. We then seek economical training sets, that is, ones which are small in size but result in learned decision trees of high accuracy. Our search focuses on geometric properties of the...
Publication date: 1997